T-REX: A Domain-Independent System for Automated Cultural Information Extraction
Author
Abstract
RDF (Resource Description Framework) is a web standard defined by the World Wide Web Consortium. In RDF, we can define schemas of interest. For example, we can define a schema about tribes on the Pakistan-Afghanistan borderland, or a schema about violent events. An RDF instance is a set of facts compatible with the schema. The principal contribution of this paper is the development of a scalable system called T-REX (short for “The RDF EXtractor”) that extracts instances associated with a user-specified schema, independently of the domain about which we wish to extract data. Using T-REX, we have successfully extracted information about various aspects of about 20 tribes living on the Pakistan-Afghanistan border. Moreover, we have used T-REX to successfully extract occurrences of violent events from a set of 80 news sites in approximately 50 countries. T-REX scales well: it has processed approximately 45,000 web pages per day over the last 6 months.
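To make the schema/instance distinction concrete, the following is a minimal sketch, not the T-REX implementation. It models facts as (subject, property, object) triples and checks whether they fit a schema. Every name in it (`SCHEMA`, `TYPES`, `is_compatible`, `ExampleTribe`, `livesIn`, and so on) is a hypothetical chosen for illustration, and the check is a simplified constraint test rather than the inference-based semantics of RDF Schema.

```python
# Hypothetical sketch: RDF-style facts as (subject, property, object) triples.
# All class, property, and resource names are illustrative assumptions.

# Schema: each property maps to (class of its subject, class of its object).
SCHEMA = {
    "livesIn": ("Tribe", "Region"),
    "speaks": ("Tribe", "Language"),
}

# Type assertions: the class each resource belongs to.
TYPES = {
    "ExampleTribe": "Tribe",
    "ExampleRegion": "Region",
    "ExampleLanguage": "Language",
}


def is_compatible(triples, schema=SCHEMA, types=TYPES):
    """Return True if every triple uses a schema property with correctly typed ends."""
    for subject, prop, obj in triples:
        if prop not in schema:
            return False  # property is not declared in the schema
        domain, range_ = schema[prop]
        if types.get(subject) != domain or types.get(obj) != range_:
            return False  # subject or object has the wrong class
    return True


# An instance that fits the schema, and one that reverses subject and object.
instance = [
    ("ExampleTribe", "livesIn", "ExampleRegion"),
    ("ExampleTribe", "speaks", "ExampleLanguage"),
]
bad_instance = [("ExampleRegion", "livesIn", "ExampleTribe")]

print(is_compatible(instance))      # True
print(is_compatible(bad_instance))  # False
```

In this sketch an extractor would only need to emit triples; whether they form a valid instance is decided entirely by the user-supplied schema, which is the sense in which extraction can be domain-independent.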
Similar resources
T-Rex: A Flexible Relation Extraction Framework
In the wake of the explosive growth in the use of the computer as a communication device has come a need for systems that help people cope with the sheer volume of information available. It is universally known that the Internet contains vast amounts of unstructured documents, but the same is also true for large organizations like publishing companies, government departments, airplane manufact...
Using Natural Language Processing to Improve Accuracy of Automated Notifiable Disease Reporting
We examined whether using a natural language processing (NLP) system results in improved accuracy and completeness of automated electronic laboratory reporting (ELR) of notifiable conditions. We used data from a community-wide health information exchange that has automated ELR functionality. We focused on methicillin-resistant Staphylococcus aureus (MRSA), a reportable infection found in unstru...
Cultural Frame and Translation of Pronominal Adverbs in Legal English
This paper explores the relationship between cultural knowledge and the specific meaning of a pronominal adverb in legal English where Chinese translators need to get the correct translation in their venture into translating the language of law. On the one hand, relying on the relevant legal cultural knowledge functioning as domain-general reference within a community or jurisdiction, tra...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Second-Order Statistical Texture Representation of Asphalt Pavement Distress Images Based on Local Binary Pattern in Spatial and Wavelet Domain
Assessment of pavement distresses is one of the important parts of pavement management systems to adopt the most effective road maintenance strategy. In the last decade, extensive studies have been done to develop automated systems for pavement distress processing based on machine vision techniques. One of the most important structural components of computer vision is the feature extraction met...